8. ASYMPTOTIC BEHAVIOR

In this section, we shall study the asymptotic behavior of the identification
equations. The results will allow us to consider the problem of controlling
the system $S$ over an infinite time interval ($N \to \infty$).

The main theoretical results will be stated; the proofs are given in 
Appendix C. 

Definition 8.1: $\{(A(k), C(k))\}_{k=0}^{\infty}$ is said to be completely
observable of index $v$ at $k$ if the observation matrix

$$M_{A,C}(k,v) \triangleq \left[\, C'(k) \;\vdots\; \Phi_A'(k,k)\,C'(k+1) \;\vdots\; \cdots \;\vdots\; \Phi_A'(k+v-2,k)\,C'(k+v-1) \,\right] \tag{8.1}$$

is of full rank $n$. $\{(A(k), C(k))\}_{k=0}^{\infty}$ is said to be uniformly
completely observable of index $v$ if the pair is completely observable of
index $v$ for all $k = 0, 1, \ldots$ .

Theorem 8.2: Let $\{(A(k), C(k))\}_{k=0}^{\infty}$ be uniformly completely
observable of index $v$, and suppose that $A(k)$, $G(k)$ are nonsingular,
$k = 0, 1, \ldots$ . If $u(k) \neq 0$, $k = 0, 1, \ldots$, then
$\{(\tilde{A}(k,u(k)), \tilde{C}(k))\}_{k=0}^{\infty}$ is uniformly completely
observable of index $v'$, $v' \le 2v$.
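The rank condition of Definition 8.1 and the statement of Theorem 8.2 can be checked numerically on a small example. The following is a minimal sketch, not the authors' program. It assumes the transition matrix is the product $\Phi_A(i,j) = A(i)\cdots A(j)$, and it assumes the augmented pair has the block form $\tilde{A}(k,u) = [A \;\; uI;\; 0 \;\; I]$, $\tilde{C} = [C \;\; 0]$ (that is, $G(k) = I$). The second-order plant, the input sequence and all function names are hypothetical. Whether a particular nonzero input sequence gives full rank within $2v$ steps is best checked numerically for the case at hand, as done here.

```python
def matmul(X, Y):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*Y)] for row in X]


def rank(M, tol=1e-9):
    """Rank by Gaussian elimination with partial pivoting."""
    M = [row[:] for row in M]
    r = 0
    for c in range(len(M[0])):
        if r == len(M):
            break
        p = max(range(r, len(M)), key=lambda i: abs(M[i][c]))
        if abs(M[p][c]) < tol:
            continue
        M[r], M[p] = M[p], M[r]
        for i in range(r + 1, len(M)):
            f = M[i][c] / M[r][c]
            M[i] = [a - f * b for a, b in zip(M[i], M[r])]
        r += 1
    return r


def observability_matrix(A_of, C_of, k, v):
    """Stack the rows C(k), C(k+1)A(k), ..., C(k+v-1)A(k+v-2)...A(k)."""
    n = len(A_of(k))
    phi = [[float(i == j) for j in range(n)] for i in range(n)]
    rows = [row[:] for row in C_of(k)]
    for j in range(1, v):
        phi = matmul(A_of(k + j - 1), phi)      # A(k+j-1) ... A(k)
        rows += matmul(C_of(k + j), phi)
    return rows


def augmented(A, c, u):
    """Assumed augmented pair for the state [x; b] with constant gain b."""
    n = len(A)

    def A_aug(k):
        top = [A[i] + [u[k] * float(i == j) for j in range(n)] for i in range(n)]
        bottom = [[0.0] * n + [float(i == j) for j in range(n)] for i in range(n)]
        return top + bottom

    def C_aug(k):
        return [c + [0.0] * n]

    return A_aug, C_aug


# Hypothetical plant: observable of index v = 2 with a scalar output.
A = [[0.5, 1.0], [0.0, 0.8]]
c = [1.0, 0.0]
plant_rank = rank(observability_matrix(lambda k: A, lambda k: [c], 0, 2))
A_u, C_u = augmented(A, c, [1.0, 2.0, 1.0])     # nonzero, varying control
A_0, C_0 = augmented(A, c, [0.0, 0.0, 0.0])     # zero control
rank_with_control = rank(observability_matrix(A_u, C_u, 0, 4))
rank_without_control = rank(observability_matrix(A_0, C_0, 0, 4))
print(plant_rank, rank_with_control, rank_without_control)
```

With zero control the columns belonging to $b$ vanish, so the gain cannot be distinguished from the output record; this is the role of the condition $u(k) \neq 0$.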

Corollary 8.3: Let $A(k)$, $G(k)$ be bounded and nonsingular. If
$\{(A(k), C(k))\}_{k=0}^{\infty}$ is uniformly completely observable of index
$v$, the error covariance matrix $\Sigma(k|k,U(0,k-1))$, which satisfies (4.21)
to (4.23), will remain bounded for all $k = 0, 1, \ldots$, where $u(k)$ is any
bounded but nonzero control for all $k = 0, 1, \ldots$ .

Lemma 8.4: Suppose that $G(k)$ satisfies

$$G(k)\,B\,G'(k) \le B ; \quad B \in M_{nn},\; B \ge 0 . \tag{8.2}$$

Let $\Xi(k) = 0$, i.e., there is no driving noise in the gain dynamics; then
for any control sequence, we have

$$\Sigma_b(k+1|k+1,U(0,k)) \le \Sigma_b(k|k,U(0,k-1)) . \tag{8.3}$$

We remark that Eq. (8.2) holds when $G(k) = I$ for all $k$, i.e., the unknown
parameter vector $b$ is constant.

An immediate consequence of Lemma 8.4 is that if (8.2) is true and
$\Xi(k) = 0$, then there exists $\bar{\Sigma}_b$ such that

$$\lim_{k \to \infty} \Sigma_b(k|k,U(0,k-1)) = \bar{\Sigma}_b \tag{8.4}$$

Note that (8.4) is true independent of the observability of
$\{(A(k), C(k))\}_{k=0}^{\infty}$. In the following theorem, we shall give
sufficient conditions under which $\bar{\Sigma}_b = 0$.

Theorem 8.5 (Main result): Let $\Xi(k) = 0$, let $A(k)$, $G(k)$ be bounded and
nonsingular, and let $G(k)$ satisfy (8.2), $k = 0, 1, \ldots$ . If
$\{(A(k), C(k))\}_{k=0}^{\infty}$ is uniformly completely observable of index
$v$ and $u(k)$ is any bounded but nonzero control for $k = 0, 1, \ldots$, then

$$\lim_{k \to \infty} \Sigma_b(k|k,U(0,k-1)) = 0 . \tag{8.5}$$
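The monotone behavior stated in Lemma 8.4 and the limit in Theorem 8.5 can be illustrated on the simplest possible case. The following is a sketch under stated assumptions, not the identification equations (4.19) to (4.23) themselves: it takes a scalar plant $x(k+1) = a\,x(k) + b\,u(k) + \xi(k)$, $y(k) = x(k) + \eta(k)$ with constant unknown $b$, and propagates the covariance of the augmented state $[x, b]$ with the standard Kalman recursion. The function name and the numerical values are hypothetical; `r` and `q` denote the variances of the discrete noises.

```python
def gain_variance_history(a, u_seq, r, q, p_xx=1.0, p_bb=1.0):
    """Variance of the gain estimate after each measurement update.

    Assumed model: x(k+1) = a x(k) + b u(k) + xi(k), y(k) = x(k) + eta(k),
    b(k+1) = b(k). The initial cross covariance between x and b is zero.
    """
    p_xb = 0.0
    history = [p_bb]
    for u in u_seq:
        # Measurement update with y(k); the observation row is [1, 0].
        s = p_xx + q
        p_bb -= p_xb * p_xb / s
        p_xb -= p_xx * p_xb / s
        p_xx -= p_xx * p_xx / s
        history.append(p_bb)
        # Time update with the known control u(k); no driving noise on b.
        p_xx = a * a * p_xx + 2.0 * a * u * p_xb + u * u * p_bb + r
        p_xb = a * p_xb + u * p_bb
    return history


with_control = gain_variance_history(0.9, [1.0] * 200, r=0.01, q=0.09)
without_control = gain_variance_history(0.9, [0.0] * 200, r=0.01, q=0.09)
print(with_control[-1], without_control[-1])
```

In this sketch the variance of $b$ never increases, it stays at its initial value when the control is identically zero, and it shrinks when a nonzero control is applied, which is the qualitative content of (8.3) and (8.5).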

Theorem 8.5 can be extended to the case where $u(k)$ is a bounded but
nonzero control for all but a finite number of $k$'s. Since
$\Sigma_b(k|k,U(0,k-1)) \ge 0$, (8.5) also implies ...
control will be kept bounded away from zero as long as exact identification
of $b(k)$ has not been obtained. Using Theorem 8.5, we predict that for such
systems the estimated parameters of $b(k)$ will converge to the true
gain parameters before the control magnitude goes to zero. This is also
borne out by the simulation results.


Analytical studies of the convergence rate of the O.L.F.O. system are ...
convergence rate for stable systems will be very slow.


Finally, we shall discuss some interesting implications of Theorem 8.5.
Consider an observable system $S$, (2.1), with unknown gain parameters
satisfying (8.7) and with $G(k)$ satisfying (8.6). Let
$\psi_k(\hat{x}(k|k), \hat{b}(k|k), \Sigma_b(k|k,U(0,k-1)))$ be any ad-hoc
control law which is "placed" after the identifier and with the following
properties ($k \ge 0$):

1) $\psi_k(\cdot,\cdot,\cdot): R^n \times R^n \times M_{nn} \to R$

2) $\psi_k(x,b,\Sigma) \neq 0$, $x \in R^n$, $b \in R^n$, $\Sigma \in M_{nn}$, $x \neq 0$, $\Sigma \neq 0$

3) $\psi_k(x,b,0) = -(h(k) + b'(k)K(k+1)b(k))^{-1}\, b'(k)K(k+1)A(k)\,x$ ; $x \in R^n$, $b \in R^n$

From condition 2, we see that $\Sigma_b(k|k,U(0,k-1)) \to 0$ as $k \to \infty$,
and so from condition 3, the ad-hoc control scheme will converge to the optimal
control strategy when the full dynamics become known. This indicates that
the ad-hoc scheme $\psi_k(\hat{x}(k|k), \hat{b}(k|k), \Sigma_b(k|k,U(0,k-1)))$
can provide reasonable results.


9. REMARKS 

Vector Control 

In our investigation, we assumed that the control is scalar. However,
the approach can be extended in a conceptually straightforward manner to the
vector control case. First, a set of identification equations is derived
which will generate the estimate of the current state, the current estimate
of the unknown gain matrix and the different cross-error-covariance matrices.

An open-loop control problem is formulated as in Section 4, equations (4.20)
to (4.31), and the discrete matrix minimum principle is used to obtain the
extremal solution. The results will be similar to those of the scalar control
case. However, the equations in the vector control case will be more
complicated.

Control Over Infinite Interval 

Let us consider the problem of controlling the system $S$, which is time
invariant and with an unknown constant gain vector $b$, over an infinite
interval, i.e., $N \to \infty$. To obtain a feasible solution we suggest the
window-shifting approach. Assume that at all times, we have $N$ more steps to
control; thus at all times we solve an open-loop control problem over an
interval of $N$ steps.

This approach is motivated by computational considerations and the theoretical
results derived in Section 8.

We note that in the O.L.F.O. approach, we have to re-solve the open-loop
control problem at every time $k$ so as to adjust the control scheme
accordingly. In our case, we have to compute $K(k|k)$ in a backward direction
starting from the terminal time $N$ to $k$ for each $k$. If $N$ is very large,
this requires a large computation time. From a computational standpoint, we
would like to "cut back" the terminal time. Conceptually, in trying to control
over an infinite time period, the controller looks into all future effects
caused by present action, and decides on the optimum move for the next
step. The window-shifting approach suggests that instead of looking at all
future effects, the controller looks at only near future effects caused by
present actions and decides on suboptimal moves. One may view such an
approach as a "short term adaptive scheme." Note also that we can adjust the
"window width" according to our computational capability. At all times, we need
only to solve for $K(k|k)$ in a backward direction starting from $N + k$ to $k$.
Thus, from a conceptual and a computational point of view, such an approach
may be desirable.
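The window-shifting idea can be sketched for a scalar plant with a known gain. This is only an illustration under stated assumptions, not the O.L.F.O. law of Section 5: the plant $x(k+1) = a\,x(k) + b\,u(k)$ is noise free, the cost uses a state weight `w`, a control weight `h` and a terminal weight `f`, and the names `window_gain` and `window_shifting_run` are hypothetical. At every step the $N$-step problem is solved backward from the end of the current window and only the first control is applied.

```python
def window_gain(a, b, h, w, f, N):
    """First-step feedback gain of an N-step problem with terminal weight f."""
    K = f
    gain = 0.0
    for _ in range(N):
        # Gain for the current stage, computed from the cost-to-go one step ahead.
        gain = -(b * K * a) / (h + b * K * b)
        # Backward scalar Riccati step.
        K = a * a * (K - K * b * b * K / (h + b * K * b)) + w
    return gain


def window_shifting_run(a, b, h, w, f, N, x0, steps):
    """Apply the first control of a freshly solved N-step problem at every k."""
    x = x0
    xs, gains = [x0], []
    for _ in range(steps):
        g = window_gain(a, b, h, w, f, N)   # window covers [k, k + N]
        gains.append(g)
        x = a * x + b * g * x               # noise-free plant, u(k) = g x(k)
        xs.append(x)
    return xs, gains


xs, gains = window_shifting_run(a=1.2, b=1.0, h=1.0, w=1.0, f=1.0, N=10,
                                x0=1.0, steps=20)
print(gains[0], xs[-1])
```

For a time-invariant plant with known $b$ the window gain is the same at every step. In the adaptive case the current estimate of $b$ and its covariance would enter the backward computation, so the gain would change as identification proceeds.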

Assume that the time invariant system $S$ being controlled is observable
and controllable. If $b$ is known exactly, then if we consider control over an
infinite time period, the optimal feedback gain is constant and is given by

$$\phi = -(h + b'Kb)^{-1}\, b'KA \tag{9.1}$$

where $K$ is given by the steady state solution of

$$K_n = A'\left(K_{n-1} - K_{n-1}b\,(h + b'K_{n-1}b)^{-1}\,b'K_{n-1}\right)A + W ; \quad K_0 = F \tag{9.2}$$

Let $N$ be the integer such that for $n > N$,

$$\| K_n - K_{n-1} \| \le \epsilon ; \quad \epsilon > 0 . \tag{9.3}$$

Such an integer $N$ can be found experimentally off-line. Adjust the window
width equal to $N$, and apply the window-shifting approach.

By Theorem 8.5, the estimate of $b$ will converge asymptotically, and so when
$\hat{b}(k|k,U(0,k-1)) \to b$, we have

$$K(k|k) \to \begin{bmatrix} K(k,N+k;F) & 0 \\ 0 & 0 \end{bmatrix}$$

where $K(k,N+k;F)$ satisfies

$$K(k,N+k;F) = A'\left(K(k+1,N+k;F) - K(k+1,N+k;F)\,b\,(h + b'K(k+1,N+k;F)b)^{-1}\,b'K(k+1,N+k;F)\right)A + W ; \quad K(N+k,N+k;F) = F \tag{9.4}$$

and

$$u^*(k|k) = -(h + b'K(k,N+k;F)b)^{-1}\, b'K(k,N+k;F)A\,\hat{x}(k|k) \tag{9.5}$$

(See discussion in Section 6.) Comparing (9.2) and (9.4), we note that

$$K(k,N+k;F) \approx K \tag{9.6}$$

Thus asymptotically, the time varying adaptive system tends to be a time
invariant control system.
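The off-line choice of the window width can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' procedure: the recursion is the Riccati iteration for a scalar control with symmetric weights, the Frobenius norm is swapped in for the norm in the stopping test (it bounds the spectral norm from above, so the test is conservative), and the second-order plant, the weights and the function names are hypothetical.

```python
def mat_vec(M, v):
    """Matrix-vector product for a matrix stored as a list of rows."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def riccati_step(K, A, b, h, W):
    """One step K -> A'(K - K b (h + b'K b)^{-1} b'K) A + W, K symmetric."""
    n = len(A)
    Kb = mat_vec(K, b)
    denom = h + sum(bi * kbi for bi, kbi in zip(b, Kb))
    M = [[K[i][j] - Kb[i] * Kb[j] / denom for j in range(n)] for i in range(n)]
    MA = [[sum(M[i][l] * A[l][j] for l in range(n)) for j in range(n)]
          for i in range(n)]
    return [[sum(A[l][i] * MA[l][j] for l in range(n)) + W[i][j]
             for j in range(n)] for i in range(n)]


def frobenius_distance(X, Y):
    """Frobenius norm of X - Y (an upper bound on the spectral norm)."""
    return sum((x - y) ** 2 for rx, ry in zip(X, Y) for x, y in zip(rx, ry)) ** 0.5


def window_width(A, b, h, W, F, eps, max_iter=10000):
    """Smallest n at which successive Riccati iterates differ by at most eps."""
    K = F
    for n in range(1, max_iter + 1):
        K_next = riccati_step(K, A, b, h, W)
        if frobenius_distance(K_next, K) <= eps:
            return n, K_next
        K = K_next
    raise RuntimeError("Riccati iteration did not settle within max_iter steps")


def feedback_gain(K, A, b, h):
    """Row vector -(h + b'K b)^{-1} b'K A."""
    n = len(A)
    Kb = mat_vec(K, b)
    denom = h + sum(bi * kbi for bi, kbi in zip(b, Kb))
    return [-sum(Kb[l] * A[l][j] for l in range(n)) / denom for j in range(n)]


# Hypothetical controllable plant with one unstable mode.
A = [[1.1, 0.2], [0.0, 0.9]]
b = [0.0, 1.0]
I2 = [[1.0, 0.0], [0.0, 1.0]]
N, K = window_width(A, b, h=1.0, W=I2, F=I2, eps=1e-6)
gain = feedback_gain(K, A, b, h=1.0)
closed_loop = [[A[i][j] + b[i] * gain[j] for j in range(2)] for i in range(2)]
print(N, gain)
```

The returned `N` plays the role of the window width, and `gain` is the constant feedback gain that the window-shifting scheme approaches once $b$ is identified.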
10. NUMERICAL EXAMPLES

In the previous sections, we have studied theoretically the adaptive
control of a discrete time linear system with an unknown gain vector. An
adaptive system was derived using the O.L.F.O. approach, and the asymptotic
behavior of the control system was discussed. There are still some important
questions which have not been treated theoretically. For example, rates of 
convergence are, in general, of great interest, but this topic was not 
treated in detail. In this section we present simulation studies carried 
out for some specific third order systems. The main purpose for these 
studies is to provide quantitative results about rates of convergence and 
to test the validity of the qualitative conclusions of Section 6. 

To enhance physical intuition, the discrete time systems were obtained 
by sampling a continuous-time system. In this case, the uncertainty of the
$b(k)$ vector is equivalent to uncertainty as to

(a) the number of zeros,

(b) the location of the zeros in the s-plane, and

(c) the plant DC gain.

It is assumed that the pole locations are known. 
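The correspondence between the gain vector and items (a) to (c) can be made concrete with a short sketch. It assumes, for illustration only, a third order plant in observer canonical form with $c' = [1 \; 0 \; 0]$, for which the entries of $b = [\beta_2, \beta_1, \beta_0]'$ are the numerator coefficients of the transfer function $(\beta_2 s^2 + \beta_1 s + \beta_0)/(s^3 + a_2 s^2 + a_1 s + a_0)$. Whether the report uses this particular realization is not stated in the text above, and the function name is hypothetical.

```python
import cmath


def zeros_and_dc_gain(b, a):
    """Zeros and DC gain implied by numerator b = [beta2, beta1, beta0].

    a = [a2, a1, a0] holds the coefficients of the monic cubic denominator.
    """
    beta2, beta1, beta0 = b
    if beta2 != 0:
        disc = cmath.sqrt(beta1 * beta1 - 4 * beta2 * beta0)
        zeros = [(-beta1 - disc) / (2 * beta2), (-beta1 + disc) / (2 * beta2)]
    elif beta1 != 0:
        zeros = [complex(-beta0 / beta1)]
    else:
        zeros = []                      # constant numerator: no finite zeros
    dc_gain = beta0 / a[2]              # H(0) = beta0 / a0
    return zeros, dc_gain


# Denominator (s - 1)(s^2 + 2s + 5) = s^3 + s^2 + 3s - 5.
den = [1.0, 3.0, -5.0]
two_zeros, dc = zeros_and_dc_gain([1.0, 5.0, 6.0], den)      # (s + 3)(s + 2)
no_zeros, _ = zeros_and_dc_gain([0.0, 0.0, 6.0], den)
one_zero, _ = zeros_and_dc_gain([0.0, 1.0, 2.0], den)
complex_pair, _ = zeros_and_dc_gain([1.0, 3.5, 5.5], den)
print(two_zeros, dc, no_zeros, one_zero, complex_pair)
```

Under this assumption, setting the leading entries of $b$ to zero removes zeros from the plant, the ratio of the entries places the zeros, and the last entry scales the DC gain.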

Let us consider a stochastic continuous time-invariant linear system
described by:

$$\dot{x}_f(t) = A_f x_f(t) + b_f u_f(t) + d_f \xi_f(t) ; \quad x_f(0) \sim \mathcal{G}(x_o, \Sigma_{xo})$$

$$y_f(t) = c' x_f(t) + \eta_f(t) ; \quad b_f \sim \mathcal{G}(b_o, \Sigma_{bo}) \tag{10.1}$$

where $\xi_f(t)$ is a scalar driving white Gaussian noise, $\eta_f(t)$ is the
Defining

$$y(k) \triangleq y_f(k\Delta) ; \quad \eta(k) \triangleq \eta_f(k\Delta) \tag{10.8}$$

the observation sequence is

$$y(k) = c'x(k) + \eta(k) \tag{10.9}$$

The statistical laws of $\xi(k)$, $\eta(k)$ are

$$\xi(k) \sim \mathcal{G}(0, r\Delta) \tag{10.10}$$

$$\eta(k) \sim \mathcal{G}(0, q\Delta) \tag{10.11}$$

The gain vector is assumed to be unknown but constant; therefore the
equation for the unknown gain is

$$b(k+1) = b(k) ; \quad b(0) \sim \mathcal{G}(b_o, \Sigma_{bo}) \tag{10.12}$$

We can now apply the results of previous sections to equations (10.6),
(10.9), (10.11), and (10.12).

A computer program was designed which operates as follows: 

(1) Read in $A_f$, $b_f$, $c$, $d_f$, $r$, $q$, $x_o$, $b_o$, the sampling
interval $\Delta$, the final time $N$, the different weightings $W$, $h$,
$F$, and the covariances $\Sigma_{xo}$, $\Sigma_{bo}$.

(2) A subroutine, which was developed by Levis [11], was used to
convert the continuous version, (10.1), to the discrete time
sampled-data version (10.6). The covariances of $\xi(k)$, $\eta(k)$
are computed using (10.11), (10.12).

(3) The true value of $x(k)$ was recorded. Using a noise generating
subroutine, a sample value of $y(k)$ was obtained. Assume that
$\hat{x}(k-1/k-1)$, $\hat{b}(k-1/k-1)$ are recorded. A subroutine for the
identification equations (4.19)-(4.23) was used to obtain the
current estimates $\hat{x}(k/k)$, $\hat{b}(k/k)$, and the error covariance
matrix $\Sigma(k/k)$ recursively. These values were also recorded.

(4) A subroutine based on (5.1)-(5.10) was used to obtain the
adaptive control $u^*(k)$.

(5) The control $u^*(k)$ was applied to the system (10.6), using a
noise generating device to obtain a sample value of $\xi(k)$; then
by (10.6), we obtained the value $x(k+1)$.

(6) We advance $k \to k+1$ and repeat (3) through (5) until we get to
the final time $k = N-1$.

The program was written in such a way that if we set $\hat{b}(k/k) = b$, and
$\Sigma_b = 0$, then the procedures (3) through (6) will give us the truly
optimal stochastic control when $b$ is known. Using a plotting subroutine we can
plot out the truly optimal trajectories vs. the O.L.F.O. trajectories; the
true $b$ vs. the estimated $b$; and the optimal feedback gain vs. the adaptive
gain (it was noted that the adaptive control correction term will converge to
zero quite fast), under the requirement that the same noise samples
($\xi(k)$, $\eta(k)$) were used for both the known $b$ and unknown $b$ cases.
These plots provide us with a qualitative understanding of the rate of
convergence of the overall suboptimal O.L.F.O. control system, and the effects
of unknown gains.
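The loop (3) through (6), together with the use of identical noise samples for the known and unknown gain cases, can be sketched for a scalar plant. This is not the original program. In particular, the control step below uses a certainty-equivalence gain from a scalar Riccati iteration as a stand-in for the O.L.F.O. law of (5.1)-(5.10), the identifier is the standard Kalman recursion for the augmented state $[x, b]$, and all names and numerical values are hypothetical.

```python
import random


def ce_gain(a, b_hat, h, w, iters=50):
    """Certainty-equivalence feedback gain (stand-in for the O.L.F.O. gain)."""
    K = w
    for _ in range(iters):
        K = a * a * K * h / (h + b_hat * b_hat * K) + w
    return -(b_hat * K * a) / (h + b_hat * b_hat * K)


def run(a, b_true, N, h, w, r, q, b_guess, p_bb0, seed, known_b=False):
    """Simulate steps (3)-(6) for x(k+1) = a x + b u + xi, y = x + eta."""
    rng = random.Random(seed)           # same seed gives the same noise samples
    x = 1.0                             # true initial state (illustrative)
    x_hat = 0.0
    b_hat = b_true if known_b else b_guess
    p_xx, p_xb, p_bb = 1.0, 0.0, (0.0 if known_b else p_bb0)
    xs, b_hats, p_bbs = [x], [], []
    for _ in range(N):
        # (3) sample y(k) and update the estimates and the error covariance
        y = x + rng.gauss(0.0, q ** 0.5)
        s = p_xx + q
        innovation = y - x_hat
        x_hat += p_xx / s * innovation
        b_hat += p_xb / s * innovation
        p_bb -= p_xb * p_xb / s
        p_xb -= p_xx * p_xb / s
        p_xx -= p_xx * p_xx / s
        b_hats.append(b_hat)
        p_bbs.append(p_bb)
        # (4) compute the control from the current estimates
        u = ce_gain(a, b_hat, h, w) * x_hat
        # (5) apply the control to the true plant with a driving noise sample
        x = a * x + b_true * u + rng.gauss(0.0, r ** 0.5)
        xs.append(x)
        # one-step prediction of the estimates before (6) advances k
        x_hat = a * x_hat + b_hat * u
        p_xx = a * a * p_xx + 2.0 * a * u * p_xb + u * u * p_bb + r
        p_xb = a * p_xb + u * p_bb
    return xs, b_hats, p_bbs


args = dict(a=1.2, b_true=1.0, N=40, h=1.0, w=1.0, r=0.01, q=0.09,
            b_guess=0.5, p_bb0=1.0, seed=1)
xs_adaptive, b_hats, p_bbs = run(**args)
xs_known, b_known, p_known = run(known_b=True, **args)
print(b_hats[-1], p_bbs[-1])
```

Setting `known_b=True` mirrors the device described above: the estimate is held at the true gain with zero covariance, so both runs see the same noise samples and differ only through the uncertainty in $b$.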
In all the computer simulations, unless otherwise mentioned, we set
the values:

$$\Delta = 0.2 \text{ sec.}, \quad r = 0.05, \quad \ldots, \quad c' = [\,1 \;\; 0 \;\; 0\,] \tag{10.13}$$


It is important to realize then that we deal with a third order
system. The only measurement is that of the output, every 0.2 seconds.
This sampled-data measurement is corrupted by white noise whose
variance is $q\Delta = (0.45)(0.2) = 0.09$ (or R.M.S. value 0.3). The
plant may have none, one, or two zeroes. We do not know how many
there are or where. Hence, even though the poles are assumed known,
the measurements are extremely meager since from the noisy measurement
of one variable, one must estimate six (three state variables
and three parameters that define the number and location of zeros).
Furthermore, the open loop plant may be unstable.


Example 1: Unstable System

It is assumed that the continuous time system is described by

$$\cdots \tag{10.14}$$

Such a system has a transfer function (see Fig. 2)

$$H_1(s) = \frac{(s + 3)(s + 2)}{(s - 1)(s^2 + 2s + 5)} \tag{10.15}$$
Fig. 2 POLE ZERO PATTERNS FOR EXAMPLE 1 (UNSTABLE SYSTEM) AND EXAMPLE 2
(STABLE SYSTEM)

Note that it has an unstable pole at $s = 1$. Initially, we set

$$\hat{b}(0/0) = [\,0 \;\; 0 \;\; \ldots\,]' \tag{10.16}$$

i.e., we start with an initial guess that the system has no zeroes. The
final time is $N = 40$ (or 8 seconds).


Many computer runs have been made on the same system with different 
noise samples. The plots for one particular sample experiment, which 
represents a fairly average behavior, are shown in Figs. 3, 4, and 5. 

From the simulation data (which is not shown completely), we can obtain 
a rough idea about the behavior of the suboptimal O.L.F.O. control system. 

From the simulations, it was found that in the beginning, the O.L.F.O.
adaptive gain is approximately zero (Fig. 5) and the O.L.F.O. trajectory
follows closely the input-free trajectory (Fig. 3).

This agrees with the discussion in Section 6 regarding the effect of
having an unstable system during the initial measurements (h very large)
in which little control is applied. The diverging phenomenon, due to the
plant instability, is detected by the identifier; controls of considerably
high magnitude are then applied for a few steps. This is indicated by the
fact that there are sharp jumps in the state trajectories. The simulation
showed that these jumps are not caused by bad noise samples, because the
same phenomenon appears in different sample runs at approximately the same
time interval. The high magnitude control serves mainly for identification
purposes; this is revealed by the fact that at the next time unit, the
estimate of $b$ closely agrees with the true $b$ (Fig. 4). As was predicted
in Section 7, the O.L.F.O. adaptive gains do converge to the truly
optimum gains (Fig. 5). The correction term vs. time is not shown in the
figure, but simulation results indicate that the correction term goes to
zero very rapidly after the identification of $b$ is essentially complete.
Fig. 3 COMPARISON BETWEEN THE OPTIMAL TRAJECTORY WHEN THE GAIN IS KNOWN
AND THE O.L.F.O. TRAJECTORY ASSUMING THE GAIN IS UNKNOWN. THE SYSTEM
BEING CONTROLLED IS UNSTABLE WITH SYSTEM FUNCTION
(S+3)(S+2) / ((S-1)(S^2+2S+5)). THE SAMPLE NOISE IS THE SAME FOR BOTH CASES.

Fig. 4 ESTIMATE OF THE UNKNOWN GAIN VECTOR VS. TIME UNIT. THE SYSTEM BEING
CONSIDERED HAS SYSTEM FUNCTION (S+3)(S+2) / ((S-1)(S^2+2S+5)).

Fig. 5 COMPARISON BETWEEN THE OPTIMAL FEEDBACK GAINS AND THE
ADAPTIVE O.L.F.O. GAINS VS. TIME UNIT. THE SYSTEM BEING CONSIDERED HAS
SYSTEM FUNCTION (S+3)(S+2) / ((S-1)(S^2+2S+5)).

Fig. 6 COMPARISON OF THE OPTIMAL TRAJECTORY WHEN THE GAIN VECTOR IS KNOWN
AND THE O.L.F.O. TRAJECTORY ASSUMING THE GAIN VECTOR IS NOT KNOWN. THE
SYSTEM BEING CONTROLLED IS STABLE AND HAS SYSTEM FUNCTION
(S+3)(S+2) / ((S+1)(S^2+2S+5)). WE GUESS INITIALLY THAT THE ZEROES ARE
LOCATED AT -7/4 ± SQRT(-39)/4. THE NOISE SAMPLE IS THE SAME FOR BOTH CASES.

Another set of simulation experiments was carried out where we kept
the same sample noise but varied the weighting $h$ ($h > 0$). It was found
from the experiments (not reported here) that the maximum magnitude of
the overshoot in the O.L.F.O. trajectories varied inversely with the value
of $h$; if $h$ was large, we had relatively low overshoots, whereas if
$h$ was small, we had relatively high overshoots. Also, the experiments seem
to indicate that the convergence rate and the final estimation error in $b$
depend on the value of $h$ we chose; with large $h$, we have a relatively
slow convergence rate and a relatively big final estimation error in $b$;
if $h$ is small, we have a relatively fast convergence rate and a relatively
small final estimation error in $b$.

In the next set of experiments, we kept the weighting fixed ($h = 0.1$) and
repeated the first set of experiments with a larger driving noise
covariance ($r = 0.45$) while using the same observation noise sample. The
experimental results (not reported here) seem to indicate that the
increase in driving noise covariance has little effect on the convergence
rate of the O.L.F.O. control system.

It is of interest to find out whether the resulting O.L.F.O. control system
is sensitive to the initial guess on $b$. We carried out a
set of experiments where we fixed

$$\cdots \tag{10.17}$$

Thus the true transfer function is

$$H(s) = \frac{\cdots}{(s - 1)(s^2 + 2s + 5)} \tag{10.18}$$

The initial condition on $x(0)$ was kept fixed, and using the same sample
noise, we varied our initial guess in $b$. The sample runs seem to indicate
that though the sample O.L.F.O. trajectory varied with different initial
guesses in $b$, the convergence rate was quite insensitive to the guess in
$b$.


Example 2: Stable System

It is assumed that

$$\cdots \tag{10.19}$$

The true transfer function for the system is (Fig. 2)

$$H_2(s) = \frac{(s + 3)(s + 2)}{(s + 1)(s^2 + 2s + 5)} \tag{10.20}$$

The system is stable.

In the first set of experiments, we initially guess

$$\hat{b}(0/0) = \cdots \tag{10.21}$$

i.e., that the zeroes are located at $-7/4 + \sqrt{-39}/4$ and
$-7/4 - \sqrt{-39}/4$. The weighting on the control is $h = 1$. We take the
final time $N = 40$.

Sample runs for the same system with the same initial guess (10.21)
were made and the plots for one particular sample are shown in Figs.
6, 7, 8. As opposed to the unstable case, the O.L.F.O. adaptive gain
is some nonzero vector, and so the value of the O.L.F.O. control is not
zero at the beginning (Fig. 8). This confirms the remarks made in
Section 7. The control is used both for identification and
control purposes. The system is stable, and since no large magnitude
control is applied, the O.L.F.O. trajectory decays down to zero (see
Fig. 6). This decaying phenomenon is noticed by the identifier, and
thus the control is kept near zero to save energy. Therefore, after a
certain time interval, when the O.L.F.O. trajectory goes near the origin,
the O.L.F.O. control will remain zero for most of the time. The system
behaves almost like an input-free system. In fact, this is also what the
truly optimum system will do. We note from Fig. 7 that the identification
process of the unknown gain $b$ stops at about $k = 20$, which is the
approximate time unit when the O.L.F.O. state trajectory begins to stay
around zero. If we consider control over an infinite interval (say using a
window-shifting approach) we may expect a very slow convergence rate in the
estimation of $b$ to the true $b$, and a slow convergence rate of the O.L.F.O.
control system to the truly optimum control system.

In the second set of experiments, we used the same noise samples as
before but starting with the initial condition

$$\cdots$$

The initial guess on $b$ was

$$\hat{b}(0/0) = \cdots \tag{10.23}$$

i.e., the plant had no zeroes. The weighting on the control is $h = 1$, and
we take the final time $N = 60$. The plots for one typical sample experiment
are shown in Figs. 9, 10, 11. (The sample noise for the sample run
shown in Figs. 9, 10, 11 is the same as that shown in Figs. 6, 7, 8.) Comparing
this set of experiments with the last, we note that more or less the same
phenomenon occurred in both sets of experiments. The final estimate in $b$
is way off its true value; in fact $\hat{b}_1(k/k)$ and $\hat{b}_2(k/k)$ are
opposite in sign to $b_1$ and $b_2$ respectively; but interestingly enough
the adaptive gains are adjusted accordingly so that the values of the
O.L.F.O. control sequence and the truly optimal control sequence are almost
the same. This set of experiments indicates yet slower convergence (if
there is any).

Fig. 7 ESTIMATE OF GAIN VECTOR VS. TIME UNIT. THE SYSTEM BEING CONSIDERED
HAS SYSTEM FUNCTION (S+3)(S+2) / ((S+1)(S^2+2S+5)). WE GUESS INITIALLY THAT
THE ZEROES ARE LOCATED AT -7/4 ± SQRT(-39)/4.

Fig. 8 COMPARISON BETWEEN OPTIMAL FEEDBACK GAIN AND O.L.F.O. ADAPTIVE GAIN.
THE SYSTEM BEING CONSIDERED HAS SYSTEM FUNCTION
(S+3)(S+2) / ((S+1)(S^2+2S+5)). WE GUESS INITIALLY THAT THE ZEROES ARE
LOCATED AT -7/4 ± SQRT(-39)/4.

Fig. 9 COMPARISON BETWEEN OPTIMAL TRAJECTORY WHEN THE GAIN VECTOR IS KNOWN AND
THE O.L.F.O. TRAJECTORY ASSUMING THE GAIN VECTOR IS UNKNOWN. THE SYSTEM
BEING CONTROLLED HAS SYSTEM FUNCTION (S+3)(S+2) / ((S+1)(S^2+2S+5)). WE GUESS
INITIALLY THAT THERE ARE NO ZEROES. THE NOISE SAMPLE IS THE SAME FOR BOTH
CASES.

Fig. 10 ESTIMATE OF THE GAIN VECTOR. THE SYSTEM BEING CONSIDERED
HAS SYSTEM FUNCTION (S+3)(S+2) / ((S+1)(S^2+2S+5)). THE INITIAL GUESS IS
THAT THERE ARE NO ZEROES.

Fig. 11 COMPARISON BETWEEN OPTIMAL FEEDBACK GAIN AND O.L.F.O. ADAPTIVE
GAIN. THE SYSTEM BEING CONSIDERED IS (S+3)(S+2) / ((S+1)(S^2+2S+5)). THE
INITIAL GUESS IS THAT THERE ARE NO ZEROES.

Note that in both sets of experiments, even if the estimate of $b$ does
not converge to the true $b$, the truly optimal trajectory and the O.L.F.O.
trajectory are almost the same after the transient period.

Intuitively, the results are reasonable. Since we have not told the
problem to identify $b$, it will not do so unless the identification is
absolutely necessary to conserve control energy. The experimental results
verified our theoretical deduction of Section 7.

The experiments seem to indicate that for stable systems, the choice of
initial guess will not greatly influence the O.L.F.O. trajectory, but will
affect the convergence rate for the estimate of the gain parameters, $b$.

Remark: In each set of experiments discussed above, the number of
sample runs is not enough to enable us to draw specific statistical
conclusions; yet the regularity in the sample runs enables us to draw some
crude conclusions.

From the simulations, we may draw the following conclusions which agree 
with the theoretical predictions regarding the O.L.F.O. control system. 

(1) The rate of convergence seems to be very dependent on the
stability of the system. For unstable systems, the convergence
rate seems to be faster compared to that for stable systems.

(2) It seems that large controls will help identification of the 
unknown gain parameters, and so convergence rate seems to relate 
directly to the magnitude of the control action. 

(3) For unstable systems, the rate of convergence seems to be 
fairly independent of the initial guess on the unknown gain, 

whereas for stable systems, the convergence rate may be quite 
dependent on the initial guess on the unknown gain. 

(4) For unstable systems, the O.L.F.O. trajectory will depend on
the initial guess in $b$, but for stable systems, the O.L.F.O.
trajectory will not vary drastically when we vary the initial
guess in $b$.

(5) For the unstable system, the O.L.F.O. trajectory seems to follow 
closely its input-free trajectory in the beginning, until the 
diverging phenomenon tells the identifier to send back large 
controls for identification purposes. This causes some overshoots 
in the trajectory. The magnitude of the maximum overshoot seems 
to relate inversely with the values for the weighting constant h 
on control. For stable systems, simultaneous identification and
control seem to be carried out in the beginning. Since the system
is stable, with little control energy, the state will go to zero,
so after some time period, when the state is near the origin,
approximately zero control is applied, thus terminating the
identification of $b$.

(6) Lastly, we would like to comment on the computational
feasibility of the proposed scheme. The above experiments were
simulated using an IBM 360/64/40 system. It was found that the actual
computation of the O.L.F.O. control sequence can be carried out almost
in real time for $N = 40$; i.e., in about 0.2 second, the following tasks
were accomplished: one step computation of (4.19)-(4.23) (one 6-vector
difference equation and one 6x6 matrix difference equation), the parameter
computations (5.3)-(5.6), and the computation of $K(k|k)$ (5.2) and of the
matrix defined by (5.8) (one 12x12 matrix difference equation and one 3x3
matrix difference equation, computed in a time-backward direction
for at most 40 steps, $k = 0, 1, \ldots, N-1$).

11. CONCLUSIONS

A technique for adaptive control for a class of linear systems with
unknown gain parameters has been presented. Simulation results have verified
qualitative theoretical predictions.

The technique proposed is more general than that proposed by Farison et
al. [3], since they enforced separation so that the control gains are not
adjusted by the uncertainty (covariance matrices) of the parameter estimates.
It differs from that proposed by Murphy [4], Gorman and Zaborszky [5],
Bar-Shalom and Sivan [6], and Florentin [7] in the sense that these authors
stated or developed techniques to approximate Bellman's equation. The paper by
Bar-Shalom and Sivan [6] did propose an O.L.F.O. approach to the problem but no
detailed derivations were carried out; thus, one could not deduce qualitative
properties of the adaptive system.

12. ACKNOWLEDGMENT 

The authors are indebted to Professors F. C. Schweppe, I. B. Rhodes, and
J. C. Willems of M.I.T. for their comments and criticisms.


13. REFERENCES

1. M. Aoki, Optimization of Stochastic Systems, Mathematics in Science
and Engineering, Vol. 32, Academic Press, 1967.

2. A. A. Fel'dbaum, Optimal Control Systems, Mathematics in Science and
Engineering, Vol. 22, Academic Press, 1965.

3. J. B. Farison, R. E. Graham, and R. C. Shelton, "Identification and
Control of Linear Discrete Systems," IEEE Trans. on Automatic Control,
AC-12, August 1967.

4. W. J. Murphy, "Optimal Stochastic Control of Discrete Linear Systems
with Unknown Gain," IEEE Trans. on Automatic Control, AC-13, August
1968.

5. D. Gorman and J. Zaborszky, "Stochastic Optimal Control of Continuous
Time Systems with Unknown Gain," IEEE Trans. on Automatic Control,
AC-13, December 1968.

6. Y. Bar-Shalom and R. Sivan, "On the Optimal Control of Discrete-Time
Linear Systems with Random Parameters," IEEE Trans. on Automatic
Control, AC-14, February 1969.

7. J. J. Florentin, "Optimal, Probing, Adaptive Control of a Simple
Bayesian System," J. Electronics and Control, Vol. 11, 1962.

8. R. Bellman, Dynamic Programming, Princeton University Press, Princeton,
New Jersey, 1957.

9. S. E. Dreyfus, Dynamic Programming and the Calculus of Variations,
Mathematics in Science and Engineering, Academic Press, 1965.

10. M. Athans, "The Matrix Minimum Principle," Information and Control,
Vol. 11, December 1967.

11. A. Levis, "On the Optimal Sampled-Data Control of Linear Processes,"
Ph.D. Thesis, M.I.T., June 1968.

12. E. Tse and M. Athans, "Optimal Minimal-Order Observer-Estimators for
Discrete Linear Time-Varying Systems," IEEE Trans. on Automatic
Control, AC-15, August 1970.

13. E. Tse, "On the Optimal Control of Linear Systems with Incomplete
Information," Report ESL-R-412, M.I.T., January 1970.

14. D. L. Kleinman and M. Athans, "The Discrete Minimum Principle with
Application to the Linear Regulator Problem," Report ESL-R-260, M.I.T.,
February 1966.

15. D. Sworder, Optimal Adaptive Control Systems, Academic Press, 1966.




APPENDIX C 


PROOFS ON ASYMPTOTIC BEHAVIOR 


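Proof of Theorem 8.2

By (A.17) and (4.20), we have

$$M_{\tilde{A},\tilde{C}}(k,2v) = \begin{bmatrix}
C(k) & 0 \\
C(k+1)\Phi_A(k,k) & C(k+1)u(k) \\
\vdots & \vdots \\
C(k+j)\Phi_A(k+j-1,k) & \sum_{\ell=k}^{k+j-1} C(k+j)\Phi_A(k+j-1,\ell+1)\,u(\ell)\,\Phi_G(\ell-1,k) \\
\vdots & \vdots \\
C(k+2v-1)\Phi_A(k+2v-2,k) & \sum_{\ell=k}^{k+2v-2} C(k+2v-1)\Phi_A(k+2v-2,\ell+1)\,u(\ell)\,\Phi_G(\ell-1,k)
\end{bmatrix}$$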

By assumption, the first $mv$ row vectors contain at least $n$ independent
vectors. Among the row vectors $C(k+v+j)\Phi_A(k+v+j-1,k)$, let

$$c'_{p_j(1)}(k+v+j)\,\Phi_A(k+v+j-1,k), \;\ldots,\; c'_{p_j(v_j)}(k+v+j)\,\Phi_A(k+v+j-1,k)$$

be the vectors which are independent of the row vectors
$C(k+v)\Phi_A(k+v-1,k)$, $C(k+v+1)\Phi_A(k+v,k)$, ...,
$C(k+v+j-1)\Phi_A(k+v+j-2,k)$, $j = 1, \ldots, v-1$; where

$$C(k+v+j) = \begin{bmatrix} c'_1(k+v+j) \\ \vdots \\ c'_m(k+v+j) \end{bmatrix}$$

and $p_j(\cdot)$ is some permutation of $\{1, 2, \ldots, m\}$. Since
$\{(A(k), C(k))\}_{k=0}^{\infty}$ is uniformly completely observable of index
$v$, it follows that $v_i \ge 0$, $i = 1, \ldots, v-1$, and that

$$m + v_1 + v_2 + \cdots + v_{v-1} = n$$
1 2 v— 1 
Assume that we have the dependence

$$c'_{p_j(s)}(k+v+j)\,\Phi_A(k+v+j-1,k) = \sum_{i=0}^{v+j-1} \alpha_i(j,s)\,C(k+i)\,\Phi_A(k+i-1,k) ; \quad 1 \le s \le v_j \tag{C.4}$$

where the only possible nonzero entries of $\alpha_i(j,s)$,
$i = 0, \ldots, v+j-1$, are those corresponding to independent rows of
$C(k+i)\Phi_A(k+i-1,k)$, $i = 0, \ldots, v+j-1$. If there exists no
$\alpha_i(j,s)$, $i = 0, \ldots, v+j-1$, which bears the relation (C.4), then
the $(m(v+j-1) + p(s))$th row vector of $M_{\tilde{A},\tilde{C}}(k,2v)$ is
independent of the first $m(v+j-1)$ row vectors. If there exists
$\alpha_i(j,s)$, $i = 0, \ldots, v+j-1$, which gives the dependence (C.4),
then such a dependence is unique by construction. Now assume that the
$(m(v+j-1) + p(s))$th row vector of $M_{\tilde{A},\tilde{C}}(k,2v)$ is
dependent on the first $m(v+j-1)$ row vectors; then we must also have the
dependence


K (s) ( ^ + \H*j)^ A (k+v+j-l,£+ 1 )u(£)^g ( £-l,k) 


VTJ-J. XV, O J. 

X^ (j,s) iC £( k+i )i A ( k+i - 1 » £+1 ) u <^iG (£ " i,k) 

i*l £=k 

Since A(k) is nonsingular, by (C.4) we have 
X+j-1 k+v+j-1 

Y Y i^(j,s)£(k+i)i A (k+i-l.m)u(!Oi G ()l-l,k) - 
i-0 A=k+i 


“ A -1 (i)A -1 (i +1) ... A -1 ( j ) ; i > j 


Since $\{(A(k), C(k))\}_{k=0}^{\infty}$ is uniformly completely observable,
the vector

$$\alpha(j,s) \triangleq [\,\alpha_0(j,s) \;\vdots\; \cdots \;\vdots\; \alpha_{v+j-1}(j,s)\,]$$

cannot be the zero row vector, $s = 1, \ldots, v_j$. By assumption $G(k)$ is
nonsingular, therefore (C.6) can be true if and only if $u(k+i) = 0$,
$i = 0, 1, \ldots, j$, which is a contradiction. This result applies for
$s = 1, \ldots, v_j$; $j = 0, 1, \ldots, v-1$. Together with (C.3) and the
remark made at the beginning of the proof, we have that
$M_{\tilde{A},\tilde{C}}(k,2v)$ will have rank $2n$ if $u(k+i) \neq 0$,
$i = 0, 1, \ldots, v-1$. The theorem follows from the assumption
that $u(k) \neq 0$, $k = 0, 1, \ldots$ .

Proof of Lemma 8.4

From (A.23) and (4.21), since $\Xi(k) = 0$, we have

$$\Sigma_b(k+1|k+1,U(0,k)) = G(k)\,\Sigma_b(k|k,U(0,k-1))\,G'(k) - [\,0 \;\vdots\; I\,]\,V^*(k+1|k,U(0,k))\left(C(k+1)\,\Delta(k|k,U(0,k))\,C'(k+1) + \Omega(k+1)\right)V^{*\prime}(k+1|k,U(0,k))\,[\,0 \;\vdots\; I\,]' \tag{C.9}$$

where $V^*(k+1|k,U(0,k))$ satisfies (4.21)-(4.23). Using (8.2), (8.3)
follows immediately from (C.9).


Proof of Theorem 8.5 (Main Result)

Let $\epsilon > 0$ be such that

$$\| \Sigma_b(k+2v|k+2v,U(0,k+2v-1)) - \Sigma_b(k|k,U(0,k-1)) \| \le \epsilon \tag{C.10}$$

where $\|\cdot\|$ is the spectral norm. Since $\Sigma_b(k|k,U(0,k-1)) \ge 0$,
$k = 0, 1, \ldots$, (8.3) and (C.10) imply that we have the inequality

$$\| \Sigma_b(k+j|k+j,U(0,k+j-1)) - \Sigma_b(k+j-1|k+j-1,U(0,k+j-2)) \| \le \epsilon ; \quad j = 1, 2, \ldots, 2v . \tag{C.11}$$

Using equation (C.9), we have 

s > ||(o:i ]v®(kfj|k+i-i,n(o,k+j-i)*(c(k+j)i(k+j-i|k+j-i,o(o,k+j-i))* 

““e’TS 

C’(k+3) +a(SH^))X* , '( k +j|k+j-l.U(0,k+j-l))f.T] II . (C.12) 


By corollary (8.3), . C(k+j)A(kfj-l|k+J-l,0(0,k+3-l))C'(k+j) + fi(k+j) 
be uniformly bounded,, sa 

I |(C(k+j)A(k+3-l|k+j-l,U(0,k+j-l))CXk+j)+a(k+j))V*' (k+j |k+j-l,U(0,k+j-l))j 

1 

I I (C(k+j)A(kfi-l|fcki-r,JI(0,k+j-l)C'(k+j)+a(k+j)) 2 | | • 1 1 (C(k+j)i(k+j-l|k+j-l,U(0,k+j-3)) 

£' (k+J)+a(k+j))V t (k+j|k+j-l, U(0, k+j-1) J.tI I I 


/ei fi (e) j - 1, 2, .... 2\> (C.13) 

* J J 

6^(e) is continuous in. e and 6^(e) + 0 as e 0, j 88 1, v. Using 

(4.21), (C.13) can also be written as follows 

I IcOcfjJiOt+j-DI^Ck+j-lIk+j-l.UW.k+j-a))^ (k+j-l)+u(k+j-l)C (k+j)- 

^ (k+j-1 1 k+j-L, U (0 , k+j - a £ ’ (k+j-1) 1 1 £ ^ (e) 

j - 1, .... 2u . (C.14) 


Since V*(k+j | k+j-1, U(0, k+j-1)) is bounded for j = 1, .... 2, therefore 
(C.12) and (4.21) imply that 

1 1 lO:i n II*(k+i|k+l-i,U(0,k+j-l))C(k+j)£(k+j-l|k+j-l,0(0,k+j-l)) 

(C.15) 



no* :0]V* (k+j I k+j-1 ,U (0 , k+j-1) ) C (k+j ) i (k+j -1 1 k+j -1 ,U (0 ,k+j -1) ) 

s Bj(e) (C.16) 

where $\beta_j(\epsilon)$ is continuous in $\epsilon$, $\beta_j(\epsilon) \to 0$
as $\epsilon \to 0$, $j = 1, 2, \ldots, 2v$. By using (4.23), (C.15) and (C.16)
and the assumption that $G(k)$ is nonsingular, the inequality (C.14) implies
(C.17) and (C.18), where $f_i(\epsilon)$ is continuous in $\epsilon$,
$f_i(\epsilon) \to 0$ as $\epsilon \to 0$, $i = 1, 2, \ldots, 2v$.
Equations (C.17) and (C.18) imply

Xb (k+1 l k+ l, U(0 » k >>' 

I^A,C (k+1 * 2v) IMf(e) (c .i9) 

(k+1 ( k+1 , U (0 ,k)) . 


where f(e) Q when e •+ 0 and is continuous in e. By theorem 8.2, 

+ I > Zv ) is of full rank, so we have 

t , V (0 , 1c) ) | j < S'Ce) <S'( s) «*• C as e + 0 (C.20) 

| |^(k+i|k+l f U(0,k)) I [ < 6"(e) 5 M (e) + 0 as e + 0 . ( C .21) 

Now the conclusion of the theorem follows from (8.4).